VOICE DATA TRANSMITTING AND RECEIVING SYSTEM 
BACKGROUND OF THE INVENTION 

This application claims benefit of Japanese Patent
Application No. 2002-349621 filed on December 2, 2002, the
contents of which are incorporated by reference.

The present invention relates to a voice data
transmitting and receiving system and, more particularly,
to a voice data transmitting and receiving system capable of
securing the meaning of transmitted data over a communication
path such as a network without guaranteed QoS (quality of
service), for instance the Internet.

As the Internet has come into common use across borders
and all over the world, electronic commerce and Internet
telephony, i.e., IP (Internet Protocol) telephony, are
attracting attention alongside such conventional
applications as home page browsing, electronic mail and
file transfer. This is largely attributable to the rapid
advancement not only of networks centered on circuit
switching, as in the telephone network, but also of IP
networks based on packet switching.

In IP telephone communication, various data
including voice (or FAX) data and also still image and
motion picture data are converted to IP packets and
transferred over an IP-based network. What is called
Internet telephony is a voice telephone service utilizing
IP network techniques, which uses, for part or all of the
network service, the same IP network (i.e., a communication
network that communicates by the Internet Protocol) as that
used for such applications as IP telephone and WWW.

IP telephony includes the following three different
systems. In a first system, voice messages are exchanged
between personal computers which are connected to the
Internet by dial-up. In this system, the same software must
be installed in the personal computers, which are in turn
connected to a server. In a second system, communication is
possible only when a telephone call is placed from a
personal computer to an ordinary subscriber telephone set (a
call in the reverse direction being impossible) or when
prearrangements are made between the two sides. The third
system takes two forms. In one form, communication between
ordinary subscriber telephone sets is made by inputting a
user ID and PIN via an Internet telephone gateway located
at the point of connection between the Internet and a
public telephone exchange. The other form is communication
between terminals coupled directly to the Internet. These
forms are closest to the existing telephone communication
system, and their technical advancement is outstanding.

Meanwhile, a system for transmitting a large amount of
voice data in a narrow band has been proposed, in which, on
the transmission side, input voice is converted by voice
recognition to character data, which is packetized and then
transmitted, and, on the reception side, the received
character data is converted back to voice data by voice
synthesis and output as voice, thereby greatly reducing the
quantity of transmitted data and avoiding communication
delay (see, for instance, Literature 1: Japanese patent
laid-open Hei 10-285275). Although this system has the
advantage of reducing the quantity of transmitted data, it
is based on character data transfer. Therefore, the
synthesized voice has a fixed character that differs from
the speaker's own voice.

In IP voice communication via an IP network such as the
Internet or a local network without guaranteed QoS
(communication quality), RTPs over the UDP protocol are
usually used for transmission and reception of voice data.
RTPs are used because importance is attached to the
real-time property of data in voice communication and
motion picture playback. However, RTP provides no measure
against packet loss occurring on the communication path,
and lost packets are not re-transferred, which causes voice
quality problems such as interruptions of the voice.

To cope with these problems, a system has heretofore
been proposed in which each RTP is transmitted together
with preceding and succeeding packet data so that an
interpolating process can be executed from that data,
whereby the voice is not interrupted even in a packet loss
event. However, in an environment in which data
communication other than voice is frequently present, voice
packet loss is pronounced, and the deterioration in voice
quality is too great to be remedied by interpolation,
sometimes making it impossible to recognize the meaning of
the speech.

As shown above, real-time voice communication by packet
transmission is subject to loss of RTPs due to
deterioration of the communication path environment,
resulting in gaps in the voice. Heretofore, satisfactory
communication could be obtained only in good communication
environments.
SUMMARY OF THE INVENTION 

An object of the present invention, accordingly, is
to provide a voice data transmitting and receiving system
which allows the meaning of speech to be recognized even in
a deteriorated communication path environment.

Another object of the present invention is to provide
a voice data transmitting and receiving system which allows
the meaning of speech to be recognized irrespective of
packet loss due to causes in the communication path.

According to a first aspect of the present invention,
there is provided a voice data transmitting and receiving
system for transmitting and receiving voice data as packet
data via a network, wherein: on the transmission side, the
voice is divided into clauses and transmitted as packet
data in the divided clause units; and on the reception
side, the voice data is outputted as voice based on the
received packet data in clause units.

According to a second aspect of the present invention,
there is provided a voice data transmitting and receiving
system, wherein: on the transmission side: real-time
communication packets (RTPs) are generated based on input
voice data; the input voice data is divided into clause
units; and a plurality of voice data RTPs in the clause
units are transferred as packet data to a communication
path; and on the reception side: packet data in clause
units are obtained from packetized data received via the
communication path, thereby producing a replica of the RTPs
in clause units; and the voice data is outputted as voice
based on the replica of the RTPs.

According to a third aspect of the present invention,
there is provided a voice data transmitting and receiving
system, wherein: on the transmission side: real-time
communication packets (RTPs) are generated based on input
voice data; the input voice data is divided into clause
units; and a plurality of voice data RTPs in the clause
units are combined into a single packet data and
transferred to a communication path; and on the reception
side: packet data in clause units are obtained from
packetized data received via the communication path,
thereby producing a replica of the RTPs in clause units;
and the voice data is outputted as voice based on the
plurality of RTPs.

The data sent out from the transmission side is in
the form of a file. On the reception side, either a
re-transfer request is issued upon recognizing missing
received data, or an interpolation process is executed on
the received data, based on the received file data. The
file data sent out from the transmission side is provided
with discrimination data. On the reception side, the
transmission side data is taken out from the received file
data based on the discrimination data. The voice is divided
into clauses based on voice recognition. The voice is
divided into clauses based on an externally provided
instruction. The voice is divided into clauses based on the
sound level of the input voice. The voice is divided into
clauses based on changes in the input voice pitch level.
The voice is divided into clauses based on measured
movement of the user's lips. The voice is divided into
clauses based on measured vibrations of the user's throat.
The systems are selected based on the extent of
communication per unit time between the transmission and
reception sides.

According to a fourth aspect of the present invention,
there is provided a method of transmitting and receiving
voice data as packet data via a network, wherein, on a
transmission side, the voice is divided into clauses and
transmitted as packet data in the divided clause units,
and, on a reception side, the voice data is outputted as
voice based on the received packet data in clause units.

According to a fifth aspect of the present invention,
there is provided a voice data transmitting and receiving
method, wherein: on a transmission side, real-time
communication packets (RTPs) are generated based on input
voice data, the input voice data is divided into clause
units, and a plurality of voice data RTPs in the clause
units are transferred as packet data to a communication
path; and on a reception side, packet data in clause units
are obtained from the received packetized data to produce
a replica of the RTPs in clause units, and the voice data
is outputted as voice based on the replica of the RTPs.

According to a sixth aspect of the present invention,
there is provided a voice data transmitting and receiving
method, wherein: on a transmission side, real-time
communication packets (RTPs) are generated based on input
voice data, the input voice data is divided into clause
units, and a plurality of voice data RTPs in the clause
units are combined into a single packet data and
transferred to a communication path; and on a reception
side, packet data in clause units are obtained from the
received packetized data to produce a replica of the RTPs
in clause units, and the voice data is outputted as voice
based on the plurality of RTPs.

Other objects and features will be clarified from
the following description with reference to the attached
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 shows the system structure of a voice data
transmitting and receiving system of a first embodiment
according to the present invention;

Fig. 2 shows the system structure of a voice data
transmitting and receiving system of a second embodiment
according to the present invention;

Fig. 3 shows the system structure of a voice data
transmitting and receiving system of a third embodiment
according to the present invention;

Fig. 4 shows the system structure of a voice data
transmitting and receiving system of a fourth embodiment
according to the present invention; and

Fig. 5 is a view for describing the operation of the
embodiment shown in Fig. 4.

PREFERRED EMBODIMENTS OF THE INVENTION 

Preferred embodiments of the present invention will 
now be described with reference to the drawings. 

Fig. 1 shows the system structure of a voice data
transmitting and receiving system of a first embodiment
according to the present invention. In this embodiment,
the transmission side comprises a communication terminal
11, a voice recognizer unit 12 and a packet combine unit
13, and the reception side, which is connected to the
transmission side via the Internet or a like communication
path, comprises a packet division unit 21 and a
communication terminal 22. While each user of course has
both transmitting and receiving functions for the purpose
of conversation, in the following description the
transmission and reception sides are dealt with separately.

On the transmission side, the user's voice inputted to
a microphone or like voice input device is processed as
voice data in the communication terminal 11. On the
reception side, the communication terminal 22 processes the
voice data, and outputs the processed voice via a
loudspeaker or like voice output device.

On the transmission side, the communication terminal
11 generates real-time communication packets (hereinafter
abbreviated as RTP) based on the input voice data. The
voice recognizer unit 12 receives the voice data from the
communication terminal 11, and executes a voice recognition
process to divide the voice into clause units. The packet
combine unit 13 combines a plurality of voice data RTPs in
clause units from the voice recognizer unit 12 into a
single packet data to be sent out to a communication path.
Alternatively, the packet combine unit 13 may send out the
voice data RTPs in clause units as they are, without
combining them.

On the reception side, the packet division unit 21
executes packet division of the packetized data received
via the communication path to obtain the RTPs of voice data
in clause units, thus producing a replica of the plurality
of RTPs as clause units. The communication terminal 22
reproduces the transmission side voice data based on the
plurality of RTPs received from the packet division unit
21.

As shown above, in this embodiment clause units, i.e.,
divisions of the voice composition that carry meaning, are
discriminated, and the voice is transmitted and received as
real-time communication packets in the discriminated clause
units. Thus, even when packet loss occurs on the
communication path because the communication environment
has deteriorated due to such a cause as communication line
deterioration, the meaning of each clause can be
transmitted, and reliable data transfer is possible.
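
The combining and division of RTPs in clause units described above can be sketched as follows. This is a minimal illustration only: the framing (a 16-bit packet count followed by length-prefixed RTPs) and the names `combine_clause` and `divide_clause` are assumptions made for the example and are not specified in this description.

```python
import struct


def combine_clause(rtp_packets: list[bytes]) -> bytes:
    """Sketch of the packet combine unit 13: pack all RTPs of one clause
    into a single packet data (assumed framing: count, then length + RTP)."""
    parts = [struct.pack("!H", len(rtp_packets))]
    for pkt in rtp_packets:
        parts.append(struct.pack("!H", len(pkt)))  # 16-bit length prefix
        parts.append(pkt)
    return b"".join(parts)


def divide_clause(data: bytes) -> list[bytes]:
    """Sketch of the packet division unit 21: recover a replica of the
    RTPs of one clause from the combined packet data."""
    (count,) = struct.unpack_from("!H", data, 0)
    offset = 2
    packets = []
    for _ in range(count):
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
        packets.append(data[offset:offset + length])
        offset += length
    return packets


if __name__ == "__main__":
    clause = [b"rtp-frame-1", b"rtp-frame-2", b"rtp-frame-3"]
    combined = combine_clause(clause)
    assert divide_clause(combined) == clause
```

With such a framing a clause either arrives whole or is lost whole, which is the property the embodiment relies on for preserving the meaning of each received clause.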

A second embodiment of the voice data
transmitting/receiving system according to the present
invention will now be described with reference to the block
diagram of Fig. 2. In Fig. 2, parts having functions like
those in the case of Fig. 1 are designated by like reference
numerals.

In this embodiment, the transmission side comprises
a communication terminal 11, a voice recognizer unit 12,
a packet combine unit 13 and a file producer (filing) unit
14, and the reception side, which is connected to the
transmission side via the Internet or a like communication
path, comprises a packet division unit 21, a communication
terminal 22 and a file receiver unit 23.

On the transmission side, the communication terminal
11 generates RTPs based on the input voice data. The voice
recognizer unit 12 executes a voice recognition process
on the voice data from the communication terminal 11 to
divide the voice into clause units. The packet combine unit
13 combines a plurality of voice data RTPs in clause units
to produce a single packet data to be sent out to the file
producer unit 14. The file producer unit 14 produces a
file from the received packets, and sends out the file to
the communication path.

On the reception side, the file receiver unit 23
receives the file data via the communication path, and
sends out the received file data as packet data to the
packet division unit 21. The file receiver unit 23 also
recognizes missing received data, if any, from the received
file data, and either sends out a data re-transfer request
to the transmission side or executes an interpolating
process on the received data, so as to prevent loss of
data.

The packet division unit 21 executes packet division
of the data received from the file receiver unit 23 to
obtain the voice data RTPs in clause units and reproduce a
replica of the plurality of RTPs as a single clause. The
communication terminal 22 generates the transmission side
voice data from the plurality of RTPs received from the
packet division unit 21, and causes the generated data to
be outputted as voice from the loudspeaker.

The second embodiment provides the advantage of the
first embodiment, namely that the meaning of each clause
can be transferred and reliable data transfer obtained even
when packet loss occurs on the communication path due to
deterioration of the communication environment stemming
from such a cause as communication line deterioration. In
addition, the file receiver unit 23 can recognize missing
received data on the basis of the received file data, and
can then either send out a data re-transfer request or
prevent loss of data through an interpolating process on
the received data.
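
The recognition of missing received data by the file receiver unit 23 might be sketched as below. The description does not state how missing data is recognized or how the choice between re-transfer and interpolation is made; the use of per-clause sequence numbers and the threshold `max_gap` are assumptions introduced only for this illustration.

```python
def find_missing(received_seqs, first_seq, last_seq):
    """Return the sequence numbers in [first_seq, last_seq] not received."""
    present = set(received_seqs)
    return [s for s in range(first_seq, last_seq + 1) if s not in present]


def group_runs(missing):
    """Group sorted sequence numbers into runs of consecutive values."""
    runs = []
    for s in missing:
        if runs and s == runs[-1][-1] + 1:
            runs[-1].append(s)
        else:
            runs.append([s])
    return runs


def plan_recovery(received_seqs, first_seq, last_seq, max_gap=1):
    """Assumed policy: interpolate gaps of at most max_gap items and
    request re-transfer for longer gaps."""
    plan = {"interpolate": [], "retransfer": []}
    for run in group_runs(find_missing(received_seqs, first_seq, last_seq)):
        key = "interpolate" if len(run) <= max_gap else "retransfer"
        plan[key].extend(run)
    return plan


if __name__ == "__main__":
    # Items 3 and 6 to 8 were lost on the communication path.
    print(plan_recovery([1, 2, 4, 5, 9], first_seq=1, last_seq=9))
```

A short gap is bridged locally while a long gap triggers a re-transfer request; either branch alone would also be consistent with the embodiment.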

A third embodiment of the voice data
transmitting/receiving system according to the present
invention will now be described with reference to the block
diagram of Fig. 3. In Fig. 3, parts having functions like
those in the case of Fig. 2 are designated by like reference
numerals.

This embodiment is basically the same in arrangement
and operation as the second embodiment shown in Fig. 2. It
is particularly effective where a firewall 24 is provided
between the communication path and the reception side. In
this embodiment, the file producer unit 14 sends out the
file data by making use of a generally open port such as
that used for HTTP or FTP, and, so that the file can be
discriminated from any other file, discrimination data is
attached after the file is produced.

On the reception side, the file receiver unit 23, which
is connected via the Internet or a like communication path,
takes out the file transmitted from the file producer unit
14 from among all the received files on the basis of the
discrimination data, and sends out the taken-out file to
the packet division unit 21. As in the above case, the file
receiver unit 23 recognizes missing received data and
either sends out a data re-transfer request or prevents
loss of data through an interpolation process on the
received data.
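
The use of discrimination data to take out the voice file from other files arriving over the same open port could look like the following sketch. The description does not define the form of the discrimination data; the fixed byte tag `DISCRIMINATION_TAG` placed at the head of the file, and both function names, are hypothetical choices for illustration.

```python
# Hypothetical discrimination data: a fixed tag at the head of the file.
DISCRIMINATION_TAG = b"VCLAUSE1"


def attach_discrimination_data(clause_file: bytes) -> bytes:
    """Transmission side: mark a produced file as clause voice data."""
    return DISCRIMINATION_TAG + clause_file


def take_out_voice_files(received_files: list[bytes]) -> list[bytes]:
    """Reception side: keep only files carrying the tag, with the tag removed."""
    n = len(DISCRIMINATION_TAG)
    return [f[n:] for f in received_files if f.startswith(DISCRIMINATION_TAG)]


if __name__ == "__main__":
    incoming = [
        b"<html>unrelated page</html>",
        attach_discrimination_data(b"clause-1-data"),
        attach_discrimination_data(b"clause-2-data"),
    ]
    assert take_out_voice_files(incoming) == [b"clause-1-data", b"clause-2-data"]
```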

The third embodiment provides the advantages of the
first and second embodiments, namely that the meaning of
each clause can be transferred and reliable data transfer
obtained irrespective of packet loss on the communication
path due to deterioration of the communication environment
stemming from such a cause as communication line
deterioration, and that loss of data can be prevented by a
data re-transfer process or a received data interpolation
process based on recognition of missing received data from
the received file data. In addition, it is possible to
communicate with a communication terminal across the
firewall.

Further embodiments of the present invention will
now be described, which are different forms of the voice
clause separation (or discrimination) system.

In a fourth embodiment of the present invention,
signals representing manual clause divisions are outputted.
It is thus possible to input the necessary divisions
according to a person's judgment by using a manual clause
division device.

With this embodiment, division data can be inputted
in any environment. Thus, the embodiment can be used not
only for voice but also for music and continuous tones.
Furthermore, the embodiment can be used for other RTP
communications such as image communication.

In a fifth embodiment of the present invention, voice
clause divisions are determined based on the measured input
sound level. More specifically, the input sound level is
measured, and an instant when the measured level falls to a
particular level is determined to be a division (or
off-division). The particular level in this case may be the
noise level when the utterance comes to a pause.

This embodiment permits automatically dividing
clauses at natural divisions in the utterance.
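
A minimal sketch of level-based division follows. It assumes that the input sound level has already been measured once per frame and that the noise level is known; the function name and the per-frame representation are illustrative assumptions.

```python
def divide_by_level(levels, noise_level):
    """Return the frame indices at which the measured level first falls
    to the noise level or below after a stretch of speech."""
    divisions = []
    speaking = False
    for i, level in enumerate(levels):
        if level > noise_level:
            speaking = True
        elif speaking:
            # Level has dropped to the noise level: mark a clause division.
            divisions.append(i)
            speaking = False
    return divisions


if __name__ == "__main__":
    frame_levels = [0.1, 0.8, 0.9, 0.1, 0.1, 0.7, 0.6, 0.05]
    assert divide_by_level(frame_levels, noise_level=0.2) == [3, 7]
```

Only the first quiet frame after speech is reported, so a long pause yields a single division rather than one per silent frame.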

In a sixth embodiment of the present invention, voice
clause divisions are determined based on the measured input
sound pitch. Specifically, the input sound pitch is
measured, and an instant when a pitch difference exceeds
a constant value is determined to be a division.
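
Pitch-based division can be sketched in the same way. The example assumes one pitch estimate per frame in hertz and takes the pitch difference between successive frames; the description does not say between which instants the difference is measured, so that choice and the name `divide_by_pitch` are assumptions.

```python
def divide_by_pitch(pitches_hz, max_step_hz):
    """Return the frame indices at which the pitch differs from the
    previous frame by more than the constant value max_step_hz."""
    return [
        i
        for i in range(1, len(pitches_hz))
        if abs(pitches_hz[i] - pitches_hz[i - 1]) > max_step_hz
    ]


if __name__ == "__main__":
    pitches = [120, 122, 125, 180, 178, 110]
    assert divide_by_pitch(pitches, max_step_hz=30) == [3, 5]
```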

This embodiment permits automatically discriminating
divisions of the utterance irrespective of whether the
background noise level is high.

In a seventh embodiment of the present invention,
voice clause divisions are determined based on the
movement of the lips, by making an image measurement of the
face of a person during voice input. In other words, an
image measurement of the person is made during voice input,
and an instant when the movement of the lips stops is
determined to be a division.

With this embodiment, divisions are determined with
a mechanism different from the voice process, and it is
thus possible to automatically discriminate divisions
even without a suitable voice discriminator.

In an eighth embodiment of the present invention,
voice clause divisions are determined based on the measured
vibrations of the throat. Specifically, vibrations of the
throat are measured, and an instant when the vibration
stops is determined to be a division.

With this embodiment, divisions are determined with a
mechanism different from the voice process, and it is thus
possible to automatically discriminate divisions even
without a suitable voice discriminator. The embodiment can
also be used in the case of an extremely low voice level.

In a ninth embodiment of the present invention, voice
clause divisions are determined by discriminating and
analyzing the voice as compositions. Specifically, the
voice is analyzed as compositions, and proper divisions are
determined. Well-known techniques may be used for analyzing
the voice into compositions.

With this embodiment, it is possible to automatically
determine divisions from meanings even in an environment
in which the above methods cannot be utilized, for instance
with flat and long continuous voice.

A tenth embodiment of the present invention will now
be described.

This embodiment is installed on the transmission side
(or the reception side), and notifies both the transmitting
and receiving communication terminals of an optimum
communication means determined by observing the
communication status.

Fig. 4 is a block diagram showing the present
embodiment.

This embodiment comprises a transmission/reception
monitor unit 31 for sensing the start and end of
communication, a communication time storage unit 32 for
accumulating communication time, a communication extent
storage unit 33 for accumulating the quantity of
transmitted or received data, a reference value/
corresponding means storage unit 34 for storing reference
values for switching communication means together with the
corresponding communication means, a comparative computer
unit 35 for calculating the communication extent from the
outputs of the communication time storage unit 32 and the
communication extent storage unit 33 and comparing the
calculated value with the reference values stored in the
reference value/corresponding means storage unit 34, and a
communication means informing unit 36 for receiving a
communication means from the comparative computer unit 35
and commanding switching to the received communication
means.

The operation of the embodiment will now be described 
with reference to Fig. 5. 
When the communication is started, the
transmission/reception monitor unit 31 senses the start of
communication, and causes the communication time storage
unit 32 and the communication extent storage unit 33 to
start their respective accumulations. Whenever a constant
time passes, the data stored in the communication time
storage unit 32 and the communication extent storage unit
33 are sent out to the comparative computer unit 35, and
the accumulated data in the communication time storage unit
32 and the communication extent storage unit 33 are
deleted. The comparative computer unit 35 computes the
extent of communication per unit time from the data sent
out from the communication time storage unit 32 and the
communication extent storage unit 33, compares the result
of the calculation with the reference values stored in the
reference value/corresponding means storage unit 34, and
sends out the data of the corresponding communication means
to the communication means informing unit 36. The
communication means informing unit 36 sends out to the
communication terminal a command for switching to the
selected communication means. When the communication is
ended, the transmission/reception monitor unit 31 senses
the end of communication, and notifies the communication
time storage unit 32 and the communication extent storage
unit 33 to end accumulation and delete the stored values.
As shown, in this embodiment the above systems can be
selectively used based on the extent of communication per
unit time between the transmission and reception sides.
With this embodiment, the transmission and reception
sides can communicate with the optimum communication means
matched to the environment of the communication path. As
examples of the communication means, the RTP communication
may be selected normally, the clause division packet
communication may be selected in a bad communication path
environment, and the file production communication may be
selected in a still worse communication path environment.
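
The comparison performed by the comparative computer unit 35 can be sketched as follows. The reference values, the means names, and the assumption that a larger extent of communication per unit time indicates a worse communication path environment are all illustrative; the description gives neither concrete reference values nor the direction of the comparison.

```python
# Hypothetical contents of the reference value/corresponding means
# storage unit 34: (upper bound in bytes per second, communication means).
REFERENCE_TABLE = [
    (50_000.0, "rtp"),             # normal environment
    (200_000.0, "clause_packet"),  # bad environment
]
FALLBACK_MEANS = "file"            # still worse environment


def select_means(data_quantity_bytes, elapsed_seconds, table=REFERENCE_TABLE):
    """Compute the extent of communication per unit time and return the
    communication means whose reference value it falls under."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    extent = data_quantity_bytes / elapsed_seconds
    for upper_bound, means in table:
        if extent < upper_bound:
            return means
    return FALLBACK_MEANS


if __name__ == "__main__":
    assert select_means(100_000, 10) == "rtp"
    assert select_means(1_000_000, 10) == "clause_packet"
    assert select_means(5_000_000, 10) == "file"
```

In the embodiment this selection would be repeated each time the constant time passes, using the values accumulated in units 32 and 33 since the previous selection.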

The arrangements and operations of the preferred
embodiments have been described above. However, these
embodiments are merely examples of the present invention
and are by no means limitative. It will be readily
understood by a person skilled in the art that various
changes and modifications are possible depending on
particular uses without departing from the scope of the
present invention.

As has been described in the foregoing, with the voice
data transmission/reception system according to the present
invention, not only is it possible to transfer the meaning
of each clause and obtain reliable data transfer even when
packet loss occurs on the communication path due to
deterioration of the communication environment stemming
from such a cause as communication line deterioration, but
it is also possible to send out a data re-transfer request
upon recognizing missing received data, to prevent loss of
data by a received data interpolation process, and to
communicate with a communication terminal across a
firewall.

Furthermore, the transmitting and receiving
communication terminals can communicate with a proper
communication means matched to the environment of the
communication path.

Changes in construction will occur to those skilled
in the art and various apparently different modifications
and embodiments may be made without departing from the scope
of the present invention. The matter set forth in the
foregoing description and accompanying drawings is offered
by way of illustration only. It is therefore intended that
the foregoing description be regarded as illustrative rather
than limiting.